A Framework For Proactive Fault Tolerance12
نویسندگان
چکیده
Fault tolerance is a major concern to guarantee availability of critical services as well as application execution. Traditional approaches for fault tolerance include checkpoint/restart or duplication. However it is also possible to anticipate failures and proactively take action before failures occur in order to minimize failure impact on the system and application execution. This document presents a proactive fault tolerance framework. This framework can use different proactive fault tolerance mechanisms, i.e., migration and pause/unpause. The framework also allows the implementation of new proactive fault tolerance policies thanks to a modular architecture. A first proactive fault tolerance policy has been implemented and preliminary experimentations have been done based on system-level virtualization and compared with results obtained by simulation.
منابع مشابه
1 Proactive Fault - Recovery in Distributed Systems
Supporting both real-time and fault-tolerance properties in systems is challenging because real-time systems require predictable end-to-end schedules and bounded temporal behavior in order to meet task deadlines. However, system failures, which are typically unanticipated events, can disrupt the predefined real-time schedule and result in missed task deadlines. Such disruptions to the real-time...
متن کاملProactive Fault-Tolerant Model Predictive Control
Fault-tolerant control methods have been extensively researched over the last 10 years in the context of chemical process control applications, and provide a natural framework for integrating process monitoring and control aspects in a way that not only fault detection and isolation but also control system reconfiguration is achieved in the event of a process or actuator fault. But almost all t...
متن کاملA proactive fault tolerance framework for high performance computing (HPC) systems in the cloud
As high-performance computing (HPC) systems continue to increase in scale, their mean-time to interrupt decreases respectively. The current state of practice for fault tolerance (FT) is checkpoint/restart. However, with increasing error rates, increasing aggregate memory and not proportionally increasing I/O capabilities, it is becoming less efficient. Proactive FT avoids experiencing failures ...
متن کاملArchitecting Dependable Systems with Proactive Fault Management
Management of an ever-growing complexity of computing systems is an everlasting challenge for computer system engineers. We argue that we need to resort to predictive technologies in order to harness the system’s complexity and transform a vision of proactive system and failure management into reality. We describe proactive fault management, provide an overview and taxonomy for online failure p...
متن کاملProactive Service Migration for Long-Running Byzantine Fault Tolerant Systems
In this paper, we describe a novel proactive recovery scheme based on service migration for long-running Byzantine fault tolerant systems. Proactive recovery is an essential method for ensuring long term reliability of fault tolerant systems that are under continuous threats from malicious adversaries. The primary benefit of our proactive recovery scheme is a reduced vulnerability window. This ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008